14 research outputs found

    Adaptive firefly algorithm for hierarchical text clustering

    Text clustering is used by search engines to increase recall and precision in information retrieval. As search engines operate on Internet content that is constantly being updated, there is a need for a clustering algorithm that groups items automatically without prior knowledge of the collection. Existing clustering methods have problems in determining the optimal number of clusters and in producing compact clusters. In this research, an adaptive hierarchical text clustering algorithm based on the Firefly Algorithm is proposed. The proposed Adaptive Firefly Algorithm (AFA) consists of three components: document clustering, cluster refining, and cluster merging. The first component introduces the Weight-based Firefly Algorithm (WFA), which automatically identifies initial centers and their clusters for any given text collection. To refine the obtained clusters, a second algorithm, termed Weight-based Firefly Algorithm with Relocate (WFAR), is proposed. This approach allows a pre-assigned document to be relocated into a newly created cluster. The third component, Weight-based Firefly Algorithm with Relocate and Merging (WFARM), aims to reduce the number of produced clusters by merging non-pure clusters into the pure ones. Experiments were conducted to compare the proposed algorithms against seven existing methods. The success rate of AFA in obtaining the optimal number of clusters is 100%, with purity and F-measure of 83% higher than the benchmarked methods. As for the entropy measure, AFA produced the lowest value (0.78) when compared to existing methods. The results indicate that the Adaptive Firefly Algorithm can produce compact clusters. This research contributes to the text mining domain, as hierarchical text clustering facilitates document indexing and information retrieval processes.
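The purity and entropy measures mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used definitions (purity as the fraction of documents that belong to the majority class of their cluster, entropy as the size-weighted average of per-cluster class entropy with the natural logarithm); the study may normalize these differently. The function names and the toy labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def purity(clusters):
    """Fraction of documents that fall in the majority class of their cluster.

    clusters: list of lists of true class labels, one inner list per cluster.
    """
    total = sum(len(c) for c in clusters)
    # Each non-empty cluster contributes the count of its most common class.
    majority = sum(Counter(c).most_common(1)[0][1] for c in clusters if c)
    return majority / total


def entropy(clusters):
    """Size-weighted average of per-cluster class entropy (natural log).

    Assumption: no normalization by the number of classes is applied.
    """
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        if not c:
            continue
        counts = Counter(c)
        h = -sum((n / len(c)) * math.log(n / len(c)) for n in counts.values())
        result += (len(c) / total) * h
    return result


# Hypothetical toy labels, not data from the study.
toy_clusters = [["sport", "sport", "sport"], ["tech", "tech", "sport"]]
print(purity(toy_clusters))
print(entropy(toy_clusters))
```

Higher purity and lower entropy both indicate clusters that are dominated by a single class, which is the sense in which the abstract reports them.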

    Blogs Search Engine Adopting RSS Syndication Using Fuzzy Logic

    The rapid development of the Internet has increased the number of blog writers. Some of these blog sites focus on solving important problems. Finding specific blogs is hard for users because many blogs contain unhelpful content such as online advertisements, notices and noise, which lowers the rank of the blog site. Retrieving more relevant blogs is a further problem, which lowers search performance. This study proposes a blog search engine adopting RSS syndication using fuzzy logic. The search engine consists of three main phases: crawling using an RSS feeds algorithm, indexing using a weblogs algorithm, and searching with fuzzy logic. In the RSS crawling process, RSS feeds are gathered to extract useful information such as title, links, publish time and description. The weblog indexing phase uses the links to retrieve the blog sites for text processing and to construct the indexing database. To retrieve the information a user needs, a user interface accepts keywords with importance degrees and computes the density of each keyword from the indexing database. The rank of the pages is computed based on the fuzzy weighted average value. A prototype was built using Visual Basic 2008 to validate the proposed blog search engine. It is a Windows application that uses the HTTP connection protocol. The system evaluation used two performance measures: precision and mean average precision. The precision parameters were determined by respondents, who identified the total retrieved links and the total relevant links for each keyword search result. Five pairs of keywords were used in testing the system. The experimental results show that the mean average precision of the whole system is 81.7%. Of the respondents, 80% know and use blogs and 20% have no knowledge of them. According to the respondents, the execution time of the system was between 3 and 5 minutes for 70% and less than 3 minutes for 30%. This percentage is good considering that 80% of respondents were satisfied with the system and 20% were strongly satisfied.
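The two evaluation measures named in the abstract above, precision and mean average precision, can be sketched as follows. This is an illustration under stated assumptions, not the study's evaluation code: each search result is represented as a ranked list of booleans (True where a respondent judged the link relevant), and average precision is normalized by the number of relevant links that were retrieved, since the abstract does not say how many relevant links exist overall. All function names and the sample judgments are hypothetical.

```python
def precision(relevant_flags):
    """Relevant retrieved links divided by total retrieved links."""
    if not relevant_flags:
        return 0.0
    return sum(relevant_flags) / len(relevant_flags)


def average_precision(relevant_flags):
    """Mean of precision@k over the ranks k that hold a relevant link.

    Assumption: normalized by the relevant links actually retrieved.
    """
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(queries):
    """Average of average_precision over all keyword queries."""
    return sum(average_precision(q) for q in queries) / len(queries)


# Hypothetical relevance judgments for two keyword searches.
query_a = [True, False, True, True]
query_b = [False, True]
print(precision(query_a))
print(mean_average_precision([query_a, query_b]))
```

Precision ignores ranking order, while mean average precision rewards result lists that place relevant links near the top, which is why the two are usually reported together.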

    Document clustering for knowledge discovery using nature-inspired algorithm

    As the Internet is overloaded with information, various knowledge-based systems are now equipped with data analytics features that facilitate knowledge discovery. These include optimization algorithms that mimic the behavior of insects or animals. This paper presents an experiment on document clustering using the Gravitation Firefly Algorithm (GFA). The advantage of GFA is that clustering can be performed without a pre-defined number of clusters k. GFA determines the cluster centers by identifying documents with high force. Once the centers are identified, clusters are created based on cosine similarity. Experimental results demonstrated that GFA with random positioning of documents outperforms existing clustering algorithms such as Particle Swarm Optimization (PSO) and K-means.

    Document clustering based on firefly algorithm

    Document clustering is widely used in Information Retrieval; however, existing clustering techniques suffer from the local optima problem in determining the number of clusters k. Various efforts have been made to address this drawback, including the use of swarm-based algorithms such as Particle Swarm Optimization (PSO) and Ant Colony Optimization. This study explores the adaptation of another swarm algorithm, the Firefly Algorithm (FA), to text clustering. We present two variants of FA: Weight-based Firefly Algorithm (WFA) and Weight-based Firefly Algorithm II (WFAII). The difference between the two is that WFAII applies a more restrictive condition in determining the members of a cluster. The proposed FA methods are evaluated using the 20Newsgroups dataset. Experimental results on the clustering quality of the two FA variants are presented and compared against those produced by PSO, K-means and a hybrid of FA and K-means (FA-Kmeans). The results demonstrate that WFAII outperformed WFA, PSO, K-means and FA-Kmeans. This indicates that better clustering can be obtained once the exploitation of a search solution is improved.

    GF-CLUST: A nature-inspired algorithm for automatic text clustering

    Text clustering is the task of grouping similar documents into a cluster while assigning dissimilar ones to other clusters. The well-known K-means algorithm is extensively employed in many disciplines. However, determining the number of clusters is a major challenge when using K-means. This paper presents a new clustering algorithm, termed Gravity Firefly Clustering (GF-CLUST), that uses the Firefly Algorithm for dynamic document clustering. GF-CLUST can identify the appropriate number of clusters for a given text collection, which is a challenging problem in document clustering. It selects documents with strong force as centers and creates clusters based on cosine similarity. This is followed by selecting potential clusters and merging small clusters into them. Experiments on various document datasets, such as 20 Newsgroups, Reuters-21578 and the TREC collection, are conducted to evaluate the performance of GF-CLUST. The purity, F-measure and entropy results of GF-CLUST are better than those produced by existing clustering techniques such as K-means, Particle Swarm Optimization (PSO) and the Practical General Stochastic Clustering Method (pGSCM). Furthermore, the number of clusters obtained by GF-CLUST is closer to the actual number of clusters than that obtained by pGSCM.

    Weight-based firefly algorithm for document clustering

    Existing clustering techniques have many drawbacks, including being trapped in local optima. In this paper, we introduce the use of a new meta-heuristic algorithm, the Firefly Algorithm (FA), to increase solution diversity. FA is a nature-inspired algorithm that is used in many optimization problems. FA is applied to document clustering by executing it on the Reuters-21578 database. The algorithm identifies the document that has the highest light intensity in a search space and represents it as a centroid. It then recognizes similar documents using the cosine similarity function. Documents that are similar to the centroid are placed in one cluster and dissimilar ones in another. Experiments performed on the chosen dataset produce high values of purity and F-measure, suggesting that the proposed Firefly Algorithm is a viable approach to document clustering.
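The assignment step described in the abstract above (documents similar to the centroid go into one cluster, the rest into another) can be sketched with plain cosine similarity. This is only an illustration: it does not implement the firefly light-intensity computation that selects the centroid, so the centroid index is passed in directly. The similarity threshold of 0.5, the function names and the toy term-frequency vectors are all assumptions that do not come from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)


def split_by_centroid(docs, centroid_index, threshold=0.5):
    """Split document indices into (similar, dissimilar) relative to a centroid.

    The threshold value is a hypothetical choice for this sketch.
    """
    centroid = docs[centroid_index]
    similar, dissimilar = [], []
    for i, vec in enumerate(docs):
        if cosine_similarity(vec, centroid) >= threshold:
            similar.append(i)
        else:
            dissimilar.append(i)
    return similar, dissimilar


# Hypothetical term-frequency vectors over a three-word vocabulary.
docs = [[2, 1, 0], [4, 2, 0], [0, 0, 3], [1, 1, 0]]
similar, dissimilar = split_by_centroid(docs, centroid_index=0)
print(similar, dissimilar)
```

Cosine similarity depends only on the direction of the vectors, so a long and a short document with the same term proportions are treated as equally close to the centroid.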

    Optimal robot path planning using enhanced particle swarm optimization algorithm

    The aim of robot path planning is to search for a safe path for a mobile robot. Although various path planning algorithms exist for mobile robots, only a few are optimized. The optimized algorithms include Particle Swarm Optimization (PSO), which finds the optimal path while avoiding obstacles and ensuring safety. However, PSO frequently converges to a sub-optimal solution when solving the optimal path problem. This paper proposes an enhanced PSO algorithm that contains an improved particle velocity. Experimental results show that the proposed Enhanced PSO performs better than the standard PSO in terms of solution quality. Hence, a mobile robot implementing the proposed algorithm operates better and is more secure.

    Fireflyclust: an automated hierarchical text clustering approach

    Text clustering is one of the text mining tasks employed in search engines. Discovering the optimal number of clusters for a dataset or repository is a challenging problem. Various clustering algorithms have been reported in the literature, but most of them rely on a pre-defined number of clusters k. In this study, a variant of the Firefly Algorithm, termed FireflyClust, is proposed to automatically cluster text documents in a hierarchical manner. The proposed clustering method operates in five phases: data pre-processing, clustering, item re-location, cluster selection and cluster refinement. Experiments are undertaken with different threshold values. Results on the TREC collections TR11, TR12, TR23 and TR45 showed that FireflyClust is a better approach than Bisect K-means, hybrid Bisect K-means and the Practical General Stochastic Clustering Method. Such a result points to directions for developing better information retrieval engines in this dynamic and fast-growing big data era.

    Nature inspired data mining algorithm for document clustering in information retrieval

    Document clustering is an important technique that has been widely employed in Information Retrieval (IR). Various clustering techniques have been reported, but the effectiveness of most of them relies on the initial number of clusters k. Such an approach may not be suitable, as prior knowledge of the document collection may not be available. To date, various swarm-based clustering techniques have been proposed to address this problem, including the one in this paper, which explores the adaptation of the Firefly Algorithm (FA) to document clustering. We extend the work on the Gravitation Firefly Algorithm (GFA) by introducing a relocate mechanism that relocates assigned documents where necessary. The newly proposed clustering algorithm, known as GFA_R, is then tested on a benchmark dataset obtained from 20Newsgroups. Experimental results on external and relative quality metrics for GFA_R are compared against those obtained using the standard GFA and Bisect K-means. The results show that extending GFA to GFA_R yields better-quality clustering.

    Discovering optimal clusters using firefly algorithm

    Existing conventional clustering techniques require a pre-determined number of clusters; unfortunately, missing information about real-world problems makes this a hard challenge. A new direction in data clustering is to automatically cluster a given set of items by identifying the appropriate number of clusters and the optimal centre for each cluster. In this paper, we present the WFA_selection algorithm, which originates from the weight-based firefly algorithm (WFA). The newly proposed WFA_selection merges selected clusters in order to produce better-quality clusters. Experiments using the WFA and WFA_selection algorithms were conducted on the 20Newsgroups and Reuters-21578 benchmark datasets, and the output was compared against bisect K-means and the general stochastic clustering method (GSCM). Results demonstrate that WFA_selection generates more robust and compact clusters than WFA, bisect K-means and GSCM.